ABSTRACT

Evaluation problems evolving from certain objectives
of intervention programs and from the characteristics of the target
population are investigated. The intellectual process approach of
experimental education with its emphasis on conative and motive
objectives requires an evaluation technique capable of reflecting the
desired behaviors more accurately, and of testing poor children
adequately. Lower verbal ability and interpretation difficulties of
poor children reduce the effectiveness of achievement tests for
assessing the results of experimental education. The advantages of
situational tests are noted and it is suggested that tests of this
type are necessary for more adequate representation of the effects of
intervention, given the characteristics of the target population. (Lh)

SITUATIONAL TESTS FOR EVALUATION OF INTERVENTION PROGRAMS:

A POSITION
Sadie A. Grimmett

Current educational programs designed to intervene into the
economically poor child's schooling experiences implicitly or
explicitly presume divergence from those schooling programs labeled
"traditional" (U. S. Office of Education, 1969). Assessment of the
impact of this difference on performance levels is frequently by ex
post facto designs comparing treated children (intervention
participants) to untreated children (Light & Smith, 1970), most often
by performance on standardized achievement tests (Jacobs & Felix,
1968). In addition to the increasing difficulty of specifying an
untreated sample (Gordon, 1970) and the biasing of the analytic
procedures (Campbell & Erlebacher, 1970), achievement instrumentation
is culpable for the assertions of failure (see Jensen, 1969) of the
experimental educational models.

Recent reviews (Baker, 1961*; Stake, 1968) indicate that a number of
educational researchers have considered the statistical, reliability,
and validity problems posed by comparative appraisal of curricula and
instructional differences between experimental and nonexperimental
programs. In this paper attention will be focused on evaluation
perplexities evolving from some of the objectives of intervention
programs and some of the characteristics of the target population. An
evaluation technique that has the potential for more accurate
reflection of the desired behaviors and more adequate accommodation to
the attributes of poor children will be suggested.

Objectives of Experimental Programs 

Experimental programs ostensibly differ among themselves in
objectives and in the instructional methodology for attainment of the
objectives. However, many of them converge on emphasis of
intellectual-process goals and the ancillary conative (skills) and
motive objectives for behavioral attainment (Fowler, 1963; Klaus &
Gray, 1968; Hughes, Wetzel, & Henderson, 1969). Bronfenbrenner (1969)
and Zigler (1970) have strongly advocated the consideration of
affective behavioral supports for successful achievement of
intellectual socialization and as independent outcomes of schooling
processes. In magnifying the importance of conative and motive
behaviors as terminal outcomes, experimental intervention varies from
nonexperimental approaches. Traditional education has consigned
importance to conation and motivation; however, the commitment appears
to be to these as facilitators of knowledge acquisition rather than to
attainment of them per se (see King & Brownell, 1966; Ragan, 1966,
pp. 59-64).

The objectives of conation include the development of a variety
of behaviors such as curiosity, gratification delay, risk taking,
persistence, activity level, locus of control, information processing
strategies, self-esteem, and independence. There exists a paucity of
standardized instruments for testing performance outcomes of these
objectives (Gordon, 1958; Moughamian, 1965), resulting in
under-representation in evaluation of behaviors affected by
experimental programs. Compensation for this dearth may necessitate
construction of "criterion referenced items" related to these learning
outcomes (Gagne, 1967; Glaser, 1963).

The intellectual-process goals of experimental education denote
preferential consideration of the structure of knowledge (Bruner, 1960)
and meaningful learning as opposed to rote memorization (Ausubel, 1963;
1968). For many interventionists, these goals reflect theoretical views
purporting that the child is actively information-producing (see Hunt,
1961). The instructional technologies are designed to induce
information processing such as auditory and visual perception
(Deutsch, 1965), concept formation (Gray, Klaus, Miller, & Forrester,
1966; Sigel & Olmsted, 1968), serial approximations to propositions
(Resnick, 1963), and spatio-temporal relations (Sonquist & Kamii,
1968). Instructional strategies of traditional approaches concentrate
on terminal products utilizing rote memory processes (Bellack &
Huebner, 1960).

Process dimensions of intellectual performance have seldom been
within the purview of achievement tests (Gordon, 1970; Jacobs & Felix,
1968; Moughamian, 1965) which by definition are concerned with knowledge
of facts or the products of learning (Humphreys, 1962). For instance,
it is conceivable that two children, one attending an intervention
program and one not, would give the same answer to an addition problem.
One may determine his answer by counting his fingers and the other may
derive his response by the algorithm: if a + b = c, then c - b = a.
Conceptually these individual derivations are manifestations of mental
processes. Current achievement evaluation does not elucidate these
qualitative differences. That such qualitative cognitive processes
have benefits rather orthogonal to I.Q. levels is suggested by the
findings of Olton and Crutchfield (1969). Similar results might
occur for achievement abilities. Thus, in addition to limiting the
potential effects of experimental education by nonsampling of
cognitive facilitating behaviors, achievement tests misrepresent the
effects of these programs by only examining the results of learning.
Meyer (n.d.) has asserted that evaluation of cognitive processes
requires orientation to response dimensions not revealed in
intellective product assessment. Emphases on process dimensions may
demand an orientation to measurement similar to the one Elkind (1969)
has adduced for Piaget.

Characteristics of Poor Children

An adjunct to the impact of appraisal instruments on potential
schooling effects is the disposition of the economically poor child.
In order to elicit behavior indicative of "true" performance ability,
the test cues or administrative instructions must consider the
characteristics of the examinee. Reissman (1962) has declared that poor
children have a better understanding of events than they display in
verbally structured situations. Verbal demands necessitate an
attending to and understanding of the emitted words. Consequently, it
can be adduced from this view that performance levels may be abated
by oral instructions.

Responses demanded on achievement tests are usually verbally cued
to several children at the same time. However, implications from
several studies are that aural stimulation is least effective for
cueing responses. Two investigations have demonstrated higher learning
for young children, tested individually, as a result of increased
attending provoked by three-dimensional stimuli (Rossi & Rossi, 1965;
Trabasso, Deutsch, & Gelman, 1966). Other findings have shown higher
responses when instructions were vocalized and simultaneously
demonstrated with concrete objects (Corsini, 1969; Rosenthal &
Zimmerman, 1970). In the Rosenthal and Zimmerman study low
socioeconomic status children were compared under verbal instructions
accompanied by modeling versus verbal instructing only, with the
former yielding greater performance in the test condition. These
laboratory findings favoring demonstrated instructions agree with the
in naturo observations reported by Meyer (1969).

Research revealing a reliable association between auditory
stimulation and decremental performance of poor children is an
additional indication that these children are less likely to display
effective performance when behavior is elicited orally (Deutsch, 1964;
Grimmett, 1970a). Deutsch found that low auditory scores were
associated with reading disability for black children. In a sample of
Mexican-Americans, Grimmett found that the auditory reception group
had a lower performance than an audiotactual group in free learning.

These data suggest that behavioral deficits of poor children may
result from disorganized orienting behaviors evolving from
instructional presentation to a modality, physiologically functional,
but rather insensitive to stimulation. Moreover, verbal instructions
given to several children simultaneously probably decrease the
signal-to-noise contrast (i.e., more noise contrasted with signal)
which might affect stimulus perception (Deutsch, 1964). Compensation
for this aural deficit seems to require verbal demands in association
with stereometric stimuli.

Another problem confronting test administration to poor children is
the understanding of the response needed for successful performance. It
is possible that modality reception is above limen, but that imposition
of meanings that differ from the intent of the directions results in
deficient responding. Caldwell and Hall (1969) have demonstrated that
when children apply a meaning consonant with that of the examiner,
discrimination performance is significantly improved. This concept
understanding produced equivalent discrimination levels for
kindergarten and second-grade children (Caldwell & Hall, 1970). By
changing the syntax of instructions, Etzel (1969) found that
performance differences between a deprived and nondeprived group of
children were attenuated. Farnham-Diggory's (1970) results of higher
verbal synthesis performance subsequent to pretraining, for trained
versus untrained low income groups, provide other evidence supporting
the import of concepts common to child and test directions for
attainment of adequate response levels. Consonance of meaning can be
obtained through pretraining preceding testing as shown by all of
these studies.

These data establish the need for test conditions which lessen
the potential debilitating effects of oral reception and symbolic
misinterpretation. That these responses seem mediated by the child's
internal representational system could lead to the argument that
language deficits of the poor (Bereiter, 1965; Rankin & Henderson,
1968) contaminate performance level on other variables. Although one
resolution could be use of nonverbal tests relying primarily on visual
stimulation, it seems unreasonable to suggest measurement of behavior
by nonverbal tests in view of the complexity of verbal systems
controlling actions. What seem more propitious, for provoking behavior
compatible with the poor child's internal knowledge structures, are
evaluation contexts containing objects, and warm-up in conjunction
with correction feedback.

Generalization of Responses 

Future achievement instrumentation may correct the neglect of
organismic dispositions and intellectual-process and ancillary-skill
assessment. Even with the occurrence of these developments, questions
can be raised about the validity of paper and pencil devices for such
evaluation on the basis of the discrepancy between the acquisition and
testing conditions. Testing assumes some degree of generalization
enabling performance in a situation nonidentical, but similar, to
earlier ones. The kinds of conditions fostering or inhibiting transfer
are seldom mentioned in instrument construction or research (Tyler,
1968). Moreover, item selection procedures for standardized
instruments convey desultory attempts to consider the circumscribing
environment during test administration. Essentially, test construction
seems subtended by the notion that measurement incises permanent
traits of the individual. That this position is tenuous is alluded to
in the deliberations of a subcommittee of the Social Science Research
Council. This Subcommittee on Compensatory Education (1970) resolved
that "Learning, performance, attitudes, curiosity, etc. will not be
thought of as characteristics which the child possesses independently
of the setting in which they are manifested." Additional evidence
supporting the untenability of presuming that discrepant acquisition
and test circumstances are irrelevant to responding is a serendipitous
finding of an in-progress study by Underwood (personal communication)
at the Arizona Center for Early Childhood Education. She is developing
a programmed procedure, using a mechanical training device, for
left-right discrimination of letter orientation. The children are
pretested on a paper and pencil device and are subsequently taught
left-right orientation to criterion. Three children have achieved
criterion but only one transferred this behavior to the paper and
pencil posttest. This tentative finding undeniably suggests that
testing instrumentation affects response generalization.

Empirical data on several of the cognitive supportive behaviors
have revealed that response levels are influenced by the situation.
For instance, the converse of delay of gratification, need of
immediate reinforcement, has been found dependent upon social
reinforcement satiation (Gewirtz, 1969). Gross and discrete stimuli
have affected persisting behaviors (Grimmett, 1970a; Turnure, 1970).
When subjected to different task demands, reliable changes in
persistence have been manifested in association with task requirements
(Allen, 1966; Wyer, 1968). Stimulus redundancy has been shown to be a
determinant of curiosity level (Berlyne & Frommer, 1966; Cantor, 1963;
Dember & Earl, 1957; Smock & Holt, 1962). Conceptual categorical
operations have been affected by the kind and representation of the
stimuli (Glick & Wapner, 1968; Szeminska, 1965). Implications from
these data are that responses are associated with the stimulus
context; and since response generalization relies upon similarity of
conditions (Ellis, 1965), it too is affected.

Achievement tests, in general, have failed to consider conditions
influencing generalization, the prepotent dispositions of poor
children, and the cognitive skill and process goals of experimental
education. This ineffectualness of current standardized instruments for
comparative curricula evaluation and for the poor population
substantiates Meier's (1967) declaration that "the clarion call has
been issued for fundamentally different evaluative techniques which
appropriately assess... the quantity and quality of growth along newly
conceived dimensions considered important for effective early
childhood intervention procedures" (p. 176).

Researchers who are decrying current methodology are asserting the
need to individualize testing, citing among others, Skinner's (1953)
techniques, as precedents (Stake, 1968). Gordon (1970) has urged the
evaluation of intellectual processes and their eliciting conditions.
Educible from these positions is the need for functional analysis of
behavior, a thesis proposed by Bridgman (1954) for understanding
behavior not easily ascertained by verbal report and for comprehending
behavior in association with its antecedent stimuli. Functional analysis
of behavior is a procedure allowing a more direct discernment of the
relationship between the response and its activating conditions.

Methodology of the Laboratory

According to Jensen (1961), the criteria which should be used
in selecting a procedure for more direct appraisal of poor children's
performance are: (1) verbal ability in itself should not be critical in
determining performance; (2) task demands should be equally
comprehensible to children of different subcultures. The author would
add a third criterion: the task properties should be sufficiently
similar to the acquisition conditions so as to maximize generalization.
These are the heuristics of functional analysis which, in turn,
necessitate an environment capable of inducing intellective processes
and cognitive facilitating behaviors.

Recent literature contains many methodologies that satisfy the
criterial restrictions while being predisposed to a functional analysis
of behavior. Perhaps the most renowned proclaimer of these kinds of
techniques is Piaget (1964). He has developed the méthode clinique
which comprises presenting stimuli to the child, with further reaction
consequent to the child's response. Piaget was more interested in
how one derives an answer than in how many answers one knows (Flavell,
1963); therefore, he applied qualitative measures to test responses.

A more standardized method for testing the behaviors of importance
to Piaget has been developed by Goldschmid and Bentler (1968). They
have constructed a test which retains the qualitative appraisal of
cognitive operations within a systematized technique allowing
instrumental responses.

The Kendlers (Kendler & Kendler, 1967) have studied reasoning
using a rather ingenious, yet simple, apparatus. These
experimentalists systematically trained each necessary test response
individually and evaluated the child's chaining of these responses to
achieve success. Appropriate chaining denoted an inferential solution.
The absence of inference and the kind of errors were noted during
unsuccessful attainment.

Inference has also been studied by Bruner (1966), who used a
different stimulus context. He designated as strategy the behavioral
chain that indexes inferring. The model for this procedure is the game,
Twenty Questions, which can be efficiently solved by successive
partitioning of the stimuli, thereby increasing the specificity of
solution. Many modifications of this procedure have occurred in the
literature (Eimas, Grimmett, 1970b; Tougas & Rowan, 1966). Eimas
computed the maximum number of chained questions to solution for the
amount of stimuli he used and therefrom distinguished directed
thinkers and nondirected thinkers.
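
The partitioning logic of the Twenty Questions procedure can be
illustrated with a brief sketch. It assumes, for illustration only,
that each question splits the remaining stimuli as nearly in half as
possible; the source does not report the formula Eimas used, so the
function names, the halving rule, and the ceiling-of-log2 bound below
are assumptions rather than his procedure.

```python
def questions_needed(n_stimuli: int) -> int:
    """Worst-case count of yes/no questions when every question halves the set.

    Assumption: this equals ceil(log2(n)), computed exactly with integers.
    """
    if n_stimuli < 1:
        raise ValueError("need at least one stimulus")
    return (n_stimuli - 1).bit_length()


def halving_search(stimuli, target) -> int:
    """Simulate successive partitioning and return the number of questions asked."""
    remaining = list(stimuli)
    asked = 0
    while len(remaining) > 1:
        mid = len(remaining) // 2
        first_half = remaining[:mid]
        asked += 1  # one question: "Is it in this half?"
        remaining = first_half if target in first_half else remaining[mid:]
    return asked


def within_bound(questions_asked: int, n_stimuli: int) -> bool:
    """Hypothetical criterion: did the child stay within the halving bound?"""
    return questions_asked <= questions_needed(n_stimuli)
```

Under these assumptions, a child who reaches solution among eight
stimuli in three questions stays within the bound, whereas one who
names stimuli one at a time could need as many as seven.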

Other well-established technologies appropriate to the study of
intellective processes are the instrumental responses to pairs of
shifts (House & Zeaman, 1963), sorting stimuli (Sigel & Olmsted, 1968),
and oddity (Gollin & Shirk, 1966).

A search of the research literature on cognitive facilitating
behaviors would reveal similar methods using degrees of stimulus
representativeness of reality. Notable among these are the procedures
of field independence-dependence by Witkin (Witkin, Dyk, Faterson, &
Goodenough, 1962), self-esteem by Bandura (1969), activity level by
Maccoby (Maccoby, Dowley, Hagen, & Degerman, 1965), and curiosity by
Berlyne (Berlyne & Frommer, 1966) and Mendel (1962).

The commonalities of the procedures of these researchers are the
utilization of instrumental responses, the association of the response
to the inducing context properties, and the assessment of the quality
of the behavior. These procedures bring the stimulus context and the
individual into a combinatorial relationship, a transaction, in which
operations on the environment, as opposed to reactions to the
environment, have primacy. A stimulus context evidencing such
characteristics has been identified as a situational test, a condition
requiring an "actual adaptive response, rather than a mere 'test'
response" (English & English, 1958, p. 504). Mandated is problem
confrontation, the resolution of which has some relevance for the
"real world."

Evaluation by Situational Tests

Situational tests have most often been a tool of the laboratory
psychologist. However, these kinds of tests are not new to
measurement, as the well-known Stanford-Binet includes subtests that
are situational. Two recent test batteries have incorporated
situational test techniques in varying degrees to appraise young
children. One of these batteries is being developed by Meier (1967).
He has stressed that the tests elicit overt responses known from
schooling experiences and confront the child with familiar simulated
tasks. Several of the tests are sufficiently similar to curriculum
procedures that they could substitute for learning activities. A
television-type apparatus is the delivery system for most of the
subtests. The perception test is situational, measuring kinesthetic
coordination and memory.

The Cincinnati Autonomy Test Battery (Banta, 1968) was constructed
to sample problem-solving behaviors of children between the ages three
and six. The battery consists of a series of situational tasks during
which the child is given warm-up so as to assure his understanding of
the test responses. Banta has stated, "The present tests are concerned
with the ways in which a child solves a problem, not just his ability
to perform a task 'correctly'" (p. 3).

These tests and the previously mentioned test by Goldschmid and
Bentler (1968) indicate efforts to extend measurement to skills usually
neglected by achievement evaluation. It is notable that the developers
employed situational tests designed to be similar to school learning
activities instead of paper and pencil devices. This is not to deny
development of self-report instruments for measuring some conative and
motivational behaviors (Maw & Maw, 1970; Penney & McCann, 1964; Soares
& Soares, 1969). However, if Katz's (1967) statement on the
significance of Bandura's methodology for advancing techniques in the
study of motivational phenomena can be generalized, then cautious
pessimism may be expressed for the efficacy of paper and pencil
measurement of conation and motivation, especially for young children.

Discussion 

Because of the limitations of current instruments eliciting written
responses of children as indices of cognitive skill repertoires, and
the incapability of these instruments to measure cognitive processes,
a reasonable alternative for alleviating these circumstances is
situational tests. In addition, numerous laboratory studies appraising
cognitive facilitating behaviors attest to the amenability of
situational tasks for the study of these variables. On the basis of the
evidence and interpretive statements of the needs of intervention
programs, the characteristics of poor children, and response
generalization, situational tests are recommended as an evaluation
approach.

Components of a situational task involving cogitation are:

1. acquisition or demonstration of the required response class;

2. verification of comprehension of instructions;

3. diminution of dependency on verbal behavior;

4. simulation of reality contexts;

5. consecution of behavioral repertoire components;

6. standardization of procedures for comparability.

These facets connote the differences between situational tests and
prosaic achievement tests. It is possible to educe from these
components conceivable advantages for measurement by situational tests.

One advantage of situational tasks is that the procedure can
account for the recent history of the examinee. By doing so, the
required response for success and comprehension of the instructions can
be provided, constituting the proximal history. This means that a
child who may have the requisite behavior is not failed because the
syntactic structure of the instructions provokes a deep-structure
discrepant with the semantic intent. By equating response adequacy and
directional concepts, it is presumable that differential behavior is,
in part, attributable to more distal experiences of which schooling
is a factor.

Another advantage of situational tests is facilitation of
generalization. Schooling activities require instrumental operations on
phenomena for acquisition of knowledge in many of the experimental
programs. Situational tests contain, by definition, assessment of
adaptive responses. Consequently, they potentiate testing contexts that
are familiar to the child. This similarity between the test-of-learning
and acquisition activities should be conducive to positive transfer.
Construction of achievement tests seems to dissociate acquisition and
testing.

Even though both kinds of testing (achievement and situation)
assess behavioral elements, situational tests can specify the
antecedents inducive of the behavioral "bit." This gives situational
tasks a third advantage, that of allying behavior to a context through
functional analysis. By associating the antecedent soliciting
configuration with the response, one can detect with greater confidence
the range and variation of operations employed by the children to
attain solutions. These kinds of individual differences are confined
to error variance in current psychometric procedures (Hendel & Weiss,
1968).

Situational tasks as a measurement methodology are not without
disadvantages. Some of these, such as subtle response influences
emanating from sex and race of the experimenter, have been stipulated
in a review of experimenter effects on various subcultures (Sattler,
1970). Other disadvantages are associated with the reliability of the
experimenter. An additional source of difficulties is related to the
effects of reinforcement and knowledge of results during testing. These
parameters need clarification for the development of situational tests
into an applicable technology for comparative educational evaluation.

Achievement tests will, no doubt, continue to be of importance
to education for the designation of a person's status. However,
experimental intervention is demanding a different view of schooling,
that of preventive remediation. To prevent decremental behaviors,
opposed to transmitting knowledge. Situational tests afford an
opportunity to assess these derivations as chained behaviors indicative
of information processing in a context conducive to response
generalization. The author contends that these kinds of tests are
demanded for more adequate representation of the effects of
intervention given the characteristics of the target population.